Cholera diagnosis in human stool and detection in water: A systematic review and meta-analysis

Background. Cholera continues to pose a problem for low-resource, fragile and humanitarian contexts. Evidence suggests that 2.86 million cholera cases and 95,000 deaths due to cholera are reported annually. Without quick and effective diagnosis and treatment, case-fatality may be 50%. In line with the priorities of the Global Task Force on Cholera Control, we undertook a systematic review and meta-analysis of diagnostic test accuracy and other test characteristics of current tests for cholera detection in stool and water.
Methods. We searched 11 bibliographic and grey literature databases. Data was extracted on test sensitivity, specificity and other product information. Meta-analyses of sensitivity and specificity were conducted for tests reported in three or more studies. Where fewer studies reported a test, estimates were summarised through narrative synthesis. Risk of bias was assessed using QUADAS-2.
Results. Searches identified 6,637 records; 41 studies reporting on 28 tests were included. Twenty-two tests had both sensitivities and specificities reported above 95% by at least one study, but there was, overall, wide variation in reported diagnostic accuracy across studies. For the three tests where meta-analyses were possible the highest sensitivity meta-estimate was found in the Cholera Screen test (98.6%, CI: 94.7%-99.7%) and the highest specificity meta-estimate in the Crystal VC on enriched samples (98.3%, CI: 92.8%-99.6%). There was a general lack of evidence regarding field use of tests, but where presented this indicated trends for lower diagnostic accuracy in field settings, with lesser-trained staff, and without the additional process of sample enrichment. Where reported, mean test turnaround times ranged from over 50% to 130% longer than manufacturer’s specification. Most studies had a low to unclear risk of bias.
Conclusions. Currently available Rapid Diagnostic Tests can potentially provide high diagnostic and detection capability for cholera. However, stronger evidence is required regarding the conditions required to secure these levels of accuracy in field use, particularly in low-resource settings.
Registration. PROSPERO (CRD42016048428).

RDTs potentially provide a cheap, accurate, quick, easy to use, and robust diagnostic tool [10,11]. Over the past three decades a number of such tests, such as Crystal VC and the Institut Pasteur (IP) dipstick, have been developed and validated in field settings including Bangladesh, Guatemala, Mexico, and Mozambique [11]. A previous review on the topic carried out in 2012 by Dick et al. [11] identified 24 cholera diagnostic tests, including RDTs, PCR technologies, agglutination, and direct fluorescence antibodies. Turnaround time of these tests was as little as 15 minutes. However, diagnostic accuracy of these RDTs for individual patients was variable; reported sensitivities ranged from 58-100%, and specificities from 60-100%. Additionally, the quality of the 18 peer-reviewed articles included in the review was found to be low, with issues surrounding sample size and sample types, the context of field-tests, and gold standards [11]. More recently, two reviews of methods for detecting cholera have been published [13,14]. These both cover laboratory tests and field-based RDTs and focus on the technical mechanisms by which these different tests work. Ramamurthy and colleagues [13] particularly highlight new methods such as loop-mediated isothermal amplification (LAMP) with the use of a lateral chromatographic flow dipstick and use of genome sequencing data, which show promising results for detection of cholera; however, research on these in field settings is currently limited.
While these three reviews provide a scope of the field of cholera diagnostic tests, they are not systematic reviews: they did not review the literature using rigorous search methods, nor did they undertake any meta-analysis. Finally, critical information on product design, pricing, ease of use and training requirements was missing from these reviews; this information is highly pertinent given the low-resource settings in which these products are most needed.
Evidence surrounding accurate cholera diagnosis, and in particular rapid diagnostic tests, remains highly topical, with recent reviews suggesting that such tests still see limited use for either surveillance or outcome detection [15]. This study aimed to appraise the evidence of diagnostic accuracy and other features relevant to use in low-income settings (such as pricing and design features) of current cholera diagnosis and detection tests for use with water or stool samples. This analysis is clearly relevant in the assessment of the suitability of current diagnostic tests for the wider use in such settings required to meet roadmap goals. Further, noting the emergence of novel diagnostic technologies [16,17], it is also relevant in informing the target product profile required of any new product proposed for use at scale in low-resource contexts in this field.
Searches were undertaken by one reviewer (JF) and screened by two reviewers (JF and KD). Discrepancies were resolved through discussion, and by an arbiter (JM) where no consensus could be reached.

Inclusion and exclusion criteria
Study inclusion was determined according to the following criteria:
• Population. People suspected to be infected with cholera.
• Index test. Diagnostic tests developed for rapid use with field samples.
• Target condition. Detection of V. cholerae in human stool and water.
• Reference test. Culture or PCR, or a combination reference including one of these.
• Setting. Field or laboratory setting.
• Outcome. Sensitivity and specificity.
We included primary field and laboratory evaluations of any study design that compared a test for cholera to a reference test, validated using field samples of water or human stool. We excluded studies which used only artificially created cholera samples and studies without a non-cholera control, i.e., studies that only included samples positive for cholera. We also excluded abstracts and articles with insufficient information on our review objective, non-research reports, opinions, editorials, and modelling studies.
No restrictions were placed upon publication date, language, or location of study.

Data extraction and analysis
In line with the protocol, analysis of studies included descriptions of:
• Diagnostic accuracy of the products, e.g., sensitivity and specificity
• Technical characteristics of products, e.g., detection target and turnaround time
• Information on product pricing and ease of use
Positive and negative predictive values (PPV and NPV) were intended for inclusion as per the study protocol; however, due to inconsistencies in reporting PPV and NPV, we focussed solely on sensitivity and specificity.
Information was therefore extracted from papers on: study characteristics; product specifications of diagnosis and detection technologies; sample characteristics, preparation, and handling; outcome measures including sensitivity, specificity, true positives, false positives, true negatives, false negatives; and data on test pricing, design characteristics, and ease of use. Data extraction was performed in duplicate using an extraction sheet designed and piloted prior to study selection. One reviewer (JF) extracted all papers, with second extraction done by a team of three reviewers (KD, FO'M and AG). Discrepancies were resolved by an independent arbiter (IV).
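As an illustration of how the extracted counts relate to the accuracy outcomes, the following minimal sketch computes sensitivity and specificity from a 2x2 table of true positives, false positives, true negatives and false negatives. The Wilson score interval is an assumption chosen for illustration; the review does not state which interval method was used for individual studies, and the counts below are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def accuracy_from_counts(tp, fp, tn, fn):
    """Sensitivity and specificity, each with a Wilson interval, from 2x2 counts."""
    sensitivity = tp / (tp + fn)  # proportion of reference-positive samples detected
    specificity = tn / (tn + fp)  # proportion of reference-negative samples cleared
    return {
        "sensitivity": (sensitivity, *wilson_interval(tp, tp + fn)),
        "specificity": (specificity, *wilson_interval(tn, tn + fp)),
    }


# Hypothetical counts for illustration only (not taken from any included study).
result = accuracy_from_counts(tp=90, fp=20, tn=80, fn=10)
for name, (estimate, low, high) in result.items():
    print(f"{name}: {estimate:.1%} (95% CI {low:.1%}-{high:.1%})")
```

Small denominators give wide intervals under any interval method, which is one reason studies with few samples contribute little certainty.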
Study quality and bias were assessed using the QUADAS-2 tool for studies of diagnostic accuracy [19]. Risk of bias was assessed by two reviewers (JF and KD). QUADAS-2 questions were focused on assessing both risk of bias and applicability and in the case of laboratory studies were adapted to include consideration of both patient and sample selection. Where no information was presented at all or insufficient information was available to reach judgment, we noted answers to QUADAS-2 as unclear. Where information was available, and studies used good practice in line with other studies of diagnostic accuracy, and concerns over test applicability were not present, we judged risk of bias and concerns over applicability as low.
To reach judgments on applicability, we considered all relevant test characteristics discussed in the document review, i.e., not only diagnostic accuracy but also the intended use of the test as described by authors, including relevant information on technical specifications and cost of test, among others. No formal assessment of publication bias was conducted.

Meta-analysis.
Meta-analyses of sensitivity and specificity were undertaken according to the methods outlined in Shim et al., 2019 [20]. Meta-analyses were carried out where data was available for three or more studies testing on the same sample type (i.e., stool or water), with the same sample handling (i.e., direct versus enriched samples). Raw numbers of true positives, false positives, true negatives, and false negatives were required, so studies without this information were excluded from meta-analysis. A separate meta-analysis was carried out for each reference test where these criteria were met. A random effects model was used to account for variation across studies, and forest plots were produced to provide a visual depiction of variability. To avoid sample overlap, only one estimate of sensitivity and one estimate of specificity was included per study in each meta-analysis. The one exception to this was where studies reported separate estimates by geographical location. Where studies had more than one estimate calculated based on the same samples (e.g., due to lab technicians and field technicians both undertaking the test), priority was given to results obtained from settings most similar to that intended by the test.
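To make the pooling step concrete, the sketch below shows one common way to obtain a random-effects meta-estimate of a proportion such as sensitivity: a DerSimonian-Laird estimator applied on the logit scale. This is an illustrative stand-in, not the procedure of Shim et al. [20] or the R code used in this review; the 0.5 continuity correction, the function name and the example counts are all assumptions.

```python
import math


def pool_random_effects(events, totals, z=1.96):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    events: per-study numerators (e.g., true positives for sensitivity)
    totals: per-study denominators (e.g., reference-positive samples)
    Returns (pooled, lower, upper, tau2).
    """
    if len(events) != len(totals) or len(events) < 3:
        # Mirrors the review's rule of pooling only with three or more studies.
        raise ValueError("need matching inputs from at least three studies")
    ys, vs = [], []
    for e, n in zip(events, totals):
        e_c, non_c = e + 0.5, (n - e) + 0.5  # assumed continuity correction
        ys.append(math.log(e_c / non_c))     # logit of the corrected proportion
        vs.append(1 / e_c + 1 / non_c)       # approximate variance of the logit
    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c)  # between-study variance
    w_star = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_star, ys)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))

    def inv_logit(x):
        return 1 / (1 + math.exp(-x))

    return inv_logit(y_re), inv_logit(y_re - z * se), inv_logit(y_re + z * se), tau2


# Hypothetical true-positive counts and reference-positive totals from three studies.
pooled, lower, upper, tau2 = pool_random_effects([45, 90, 180], [50, 100, 200])
print(f"pooled sensitivity {pooled:.3f} (95% CI {lower:.3f}-{upper:.3f}), tau^2={tau2:.3f}")
```

The same function can be applied to true negatives over reference-negative totals to pool specificity. Pooling the two outcomes separately ignores their correlation, which is the motivation for the SROC and DOR analyses.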
Due to the correlation of sensitivity and specificity estimates, additional analyses were undertaken to overcome this and provide single meta-estimates of diagnostic accuracy.
Summary receiver operating characteristic (SROC) curves of sensitivity against false positive rate (false positive rate = 1-specificity) were plotted and the area under the curve (AUC) calculated. The AUC varies from 0 to 1 and estimates the percentage of correct predictions of a test, with a value of 0 representing a test whose diagnoses are 100% wrong, 1 representing a test whose diagnoses are 100% correct, and 0.5 representing a test with a 50% chance of a correct diagnosis.
Additional supplementary meta-analyses were undertaken using diagnostic odds ratios (DOR). Further details can be found in S3 Appendix.
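For readers unfamiliar with the diagnostic odds ratio, the following sketch shows how a single-study DOR and an approximate 95% confidence interval can be computed from a 2x2 table. The 0.5 correction for zero cells and the example counts are assumptions for illustration; the review's own DOR analysis is described in S3 Appendix.

```python
import math


def diagnostic_odds_ratio(tp, fp, tn, fn, z=1.96):
    """DOR = (TP * TN) / (FP * FN), with a log-scale confidence interval."""
    cells = [tp, fp, tn, fn]
    if any(c == 0 for c in cells):
        # Assumed convention: add 0.5 to every cell when any cell is zero.
        cells = [c + 0.5 for c in cells]
    tp, fp, tn, fn = cells
    dor = (tp * tn) / (fp * fn)
    se_log = math.sqrt(sum(1 / c for c in cells))  # standard error of log(DOR)
    log_dor = math.log(dor)
    return dor, math.exp(log_dor - z * se_log), math.exp(log_dor + z * se_log)


# Hypothetical counts for illustration only.
dor, low, high = diagnostic_odds_ratio(tp=90, fp=20, tn=80, fn=10)
print(f"DOR {dor:.1f} (95% CI {low:.1f}-{high:.1f})")
```

A DOR of 1 corresponds to a test that does not discriminate; larger values indicate better discrimination, combining sensitivity and specificity into a single number.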
Narrative synthesis. Given meta-analysis was not possible for the majority of tests, sensitivity and specificity results were also synthesized narratively, by presenting a range of estimates for each test, and plotting sensitivities and specificities graphically. Tests were sorted into three groups for narrative synthesis: immunologically-based tests, PCR-based tests, and 'other' test types. Results of studies were sub-grouped by intended location of test, sample type, target, and whether the sample was enriched prior to testing. For each of these groups the range of sensitivities and specificities reported in studies is detailed. Tests were classified as laboratory evaluations or field evaluations according to a) the location where the test was undertaken and b) the personnel who undertook the test (for example field technicians or clinicians, versus lab technicians). This classification was undertaken by reviewers based on the descriptions of test procedures available in the included texts, and the assessment of this review may deviate from the 'field' or 'laboratory' label used by the authors of a study.
Information on other components of the diagnostic products was also synthesized narratively, within the same three groups. This means that where available we extracted and report on the information study authors provided regarding the way in which products are due to be used (including ease of use, necessary training of health care workers, instructions for use) and their potential value for money (e.g., elements of cost, efficiency of deployment).
Meta-analysis was undertaken in R version 3.6.3, and all other analysis was undertaken in Microsoft Excel.

Search results
Searches identified 6,637 records. Once duplicates were removed, the titles and abstracts of 4,163 records were screened for relevance. Full text review was undertaken on 181 papers, and 35 were selected for analysis. The search process is detailed in Fig 1, including the reasons for exclusion of 146 records during the full text assessment.
During search updates in March 2020, a further 602 records were retrieved (after exclusion of duplicates with original search), and four additional studies identified for inclusion. Two further studies were identified through reference lists of included studies, resulting in 41 studies in total being included in the final analysis.

Characteristics of included studies
The 41 studies included 13 field assessments and 31 laboratory assessments of cholera diagnostic products (four studies included both field and laboratory assessments; one study was unclear on the location). Samples came from a range of countries, primarily in South and East Asia, the most reported being Bangladesh (13 studies) and India (eight studies). Twenty-eight different products were reported on. These included immunologically-based tests detecting lipopolysaccharides or proteins of V. cholerae (for example Crystal VC, Cholera SMART, and Cholera Screen), and PCR-based tests detecting genes or nucleic acids (for example TaqMan Array Card). Most studies utilised stool samples in their testing, however five studies using water samples were included. Reference tests were overwhelmingly bacterial culture (in 33 studies), with PCR (in six studies), or combination references (three studies) also used. Overall, 24,835 samples were captured, with individual study sample sizes ranging from 27 to 6,497. Complete study characteristics are presented in Table 1.
The 28 different diagnostic tests and detection products reported included several different types of test. For the purposes of analysis, these were split into three broad categories, based on the mechanism of action of the diagnostic test: 'immunologically-based' tests detecting lipopolysaccharides or proteins, PCR-based tests, and 'other tests', which included selective media-based tests and real-time cell analysis. Immunologically-based tests are those detecting antigens of V. cholerae O1 and O139, such as lipopolysaccharides or proteins (reported in studies in a variety of ways, for instance 'V. cholerae O1 antigen A' [57], ''A' factor of V. cholerae O139' [52], 'V. cholerae O1 and O139 antigens' [34]). These tests were predominantly intended for field use (17 of 20 tests). PCR-based tests are those detecting genes and nucleic acids associated with pathogenic V. cholerae through PCR, for instance detecting the toxR gene [44], or espM gene [46] of V. cholerae. Finally, other types of tests included selective media-based tests and real-time cell analysis. In selective media-based tests, stool samples containing bacterial organisms (in this case V. cholerae) were grown on selective medium and preliminarily identified on the basis of colony appearance [31]. The single cholera-toxin real-time cell analysis test included assesses how different mammalian cell types respond to cholera toxin as they grow [41].

Results: Meta-analysis
Three tests had sufficient data to undertake meta-analysis: Crystal VC, Cholera Screen, and IP dipstick. For Crystal VC separate meta-analyses were carried out for samples tested directly ("direct samples") and samples enriched in Alkaline Peptone Water (APW) prior to testing ("enriched samples"); for IP dipstick, only direct samples could be included in the meta-analysis, as there was insufficient comparable data on enriched samples; for Cholera Screen all samples were tested directly. Table 2 reports a summary of results obtained from the meta-analyses and the SROC analysis.
For tests on direct samples, Cholera Screen showed the highest sensitivity meta-estimate of the analysed tests, at 98.6% (95% CI: 94.7-99.7), with the lowest sensitivity meta-estimate reported for Crystal VC. Crystal VC on direct samples also had the lowest specificity meta-estimate, at 77.7% (CI: 70.7-83.3), with the highest noted for the IP dipstick test.
For tests used on enriched samples, data are available only for the Crystal VC. Considered alongside the other tests, its sensitivity meta-estimate is the lowest overall at 85.5% (68.1-94.4); however, its specificity meta-estimate is the highest overall at 98.3% (92.8-99.6).
Heterogeneity in sensitivity and specificity was seen across studies for each of the four meta-analyses, as can be seen from the ranges reported in Table 2. The highest variation was seen in the specificity of the Cholera Screen test, where the lowest reported specificity was 22.2% (6.4-47.6%) [25] and the highest 100.0% (93.3-100.0%) [39]. Forest plots providing a visual representation of the variation in sensitivity and specificity across tests and studies can be found in S3 Appendix.
The area under the curve estimates are high (greater than 0.8) across all tests, with the IP dipstick showing the highest AUC of 0.969. SROC curves plotting the sensitivity against false positive rate are reported in S3 Appendix.

Results: Narrative synthesis of sensitivity and specificity
Given meta-analysis was not possible for the majority of tests, a narrative synthesis was also undertaken to capture diagnostic accuracy results. Tests were split into three categories: immunologically-based tests, PCR-based tests, and other test-types. We present the findings of the narrative analysis in the forthcoming section. Table 3 presents a summary of findings of the diagnostic accuracy for each test. A full breakdown of the sensitivity and specificity results of individual studies can be found in S4 Appendix. Additionally, a full dataset of extracted and back-calculated values of sensitivity, specificity, true positives, false positives, true negatives and false negatives for each study can be found in S5 Appendix.
Immunologically-based tests detecting lipopolysaccharides or proteins. Twenty immunologically-based tests, with a total of 35 sub-groups, were included in the narrative synthesis.
Sensitivity. Of those tests intended for field use, the most frequently studied test, the Crystal VC, had reported sensitivities ranging from 65.6% (testing directly on stool samples with bacterial culture reference [1]) to 98.9% (on enriched stool samples with bacterial culture reference [24]). Several tests had reported sensitivity of 100% (nine tests, 12 sub-groups): the Cholera SMART, Cholera Screen, IP dipstick, two-tip dipstick ELISA, Vch-UPT-LF, Cholera [...] [42,49,59] and 97% in two studies on enriched samples [22,59]. Of those tests intended for laboratory use, the bead ELISA and dot-blot ELISA were reported in more than one study, although on different samples from the same study groups. The highest sensitivities were found for the Cholera DFA [35] and dot-blot ELISA [27]. Chaicumpa 1998 [27] reported higher sensitivities for the dot-blot ELISA when stool samples were enriched versus directly tested (100% compared to 63%).
Additionally, some tests had specificities of 100% reported by one study, with much lower specificities reported by others: whilst one study (Islam 1994 [39]) reported specificity of 100% for the Cholera Screen test, notably low specificities were reported by two other studies: 42.9% [...] (Hasan 1994a [35]). However, these results were only found in single studies.
PCR-based tests. All but one of the five PCR tests were assessed by only one study, which limits interpretation. However, four of five studies reported sensitivity of 100%. In Albert 1997 [21], the PCR assay with new primers O139-1 and O139-2 had a sensitivity of 94%. In the Multiplex PCR, where two studies reported results, Hoshino 1998 [38] reported sensitivity of 100% on enriched samples, whereas Sayeed 2018 [56] reported sensitivity of 73.6% on direct samples (where sensitivity was estimated using Bayesian latent class modelling).
The TaqMan Array Card was assessed against both a conventional assay and PCR Luminex reference, and was found to have sensitivity and specificity of 100% in both instances [44].
Other test types: Selective media-based tests and real-time cell analysis. One study investigated two selective media-based tests, the ChromID Vibrio and Thiosulfate-Citrate-Bile Salts-Sucrose (TCBS) agar [31]. In both tests, direct stool samples had lower sensitivities than enriched samples (79% versus 100%, respectively). ChromID Vibrio appeared to be more specific than TCBS for both enriched and directly tested samples (100% versus 50%).
One study reported on a cholera-toxin real-time cell analysis, reporting sensitivity and specificity of 90% and 100%, respectively [41].

Results: Other characteristics of included tests
Price. Price was not well reported across diagnostic tests, with information only available for four of 21 immunologically-based tests and one of five PCR-based tests. Price was reported by multiple papers for Crystal VC, and one paper each for Cholera SMART, the Medicos dipstick, and the SD Bioline. Of these, Crystal VC was the cheapest at approximately USD $1.90 per test [1,32,34,43] and Cholera SMART the most expensive at USD $14 per test [42]. In contrast, the one PCR test with price available was considerably more expensive: the TaqMan Array Card was USD $60 per card [44]. However, it was unclear from the text how many samples a single TaqMan Array Card could process.
Test time. Test time was similarly more comprehensively reported across immunologically-based tests than PCR-based tests. Immunologically-based tests ranged in testing time from two minutes (Pathogen Detection Kit [23]) or less than five minutes (Cholera Screen [29,39] and BengalScreen [37]) to 3.5 hours (Bead ELISA [53]). Tests intended for field use reported times between two minutes and less than two hours [23,57], and tests intended for laboratory detection reported times from less than 30 minutes to 3.5 hours [35,53]. The two-minute estimate for the Pathogen Detection Kit was for samples that were clinically deemed to have a high probability of cholera [23].

PCR-based tests generally did not specify turnaround time, with the exception of the Multiplex PCR [38] which took approximately five hours.
Ease of use and training. No information on ease of use or training was provided for PCR-based tests; however, given these were all laboratory-based, we can assume technical skill was required. Multiple studies reported ease of use for a number of different immunologically-based tests; however, few specifics were given beyond 'simple', 'easy to use', or 'easy to perform' [29,37]. The exception to this was the Cholera SMART: while Hasan 1994b [36] described the test as simple and easy to use, users in Kalluri 2006 [42] reported that "the SMART device was often difficult to interpret and was frustrating to use".
Other test features: storage, internal quality control, result capturing. Where reported, immunologically-based tests displayed results as coloured lines or spots on the device [39,43,45,52], and PCR-based tests as bands on a gel electrophoresis or plate [21,38]. All studies that provided information about quality control reported some form of positive and/or negative control included in the test. Information on storage was not well reported, with information only available for six tests.
Further details regarding other characteristics of included tests can be found in Table 4.

Risk of bias and applicability
Risk of bias was assessed using the QUADAS-2 framework [19]. Sample selection was deemed low to unclear risk of bias across studies; studies assessed as unclear were graded so due to a paucity of information on selection in the record. All studies providing information on sample selection were assessed as low risk.
Risk of bias in the interpretation of the index test was assessed as high in 13 studies. In all of these, this relates to applicability concerns, as the intended location of the test did not match the location in which the study evaluated that test, as assessed by reviewers. The remaining studies were rated unclear, where information was missing, or low. Only eight studies specified that blinding was used for interpretation of test results, and one study specified that it was not used. The remaining studies were unclear.
Risk of bias in the test conduct and interpretation of the reference test was assessed as low in all but three studies. In Hao 2017 [33], Jin 2013 [41] and Eddabra 2011 [31], risk of bias was graded unclear, due to the use of complex combination references, in which two of three different test methods were required to be positive. While the remainder of the studies were graded low risk, the majority used a bacterial culture reference. The low grade was deemed appropriate as bacterial culture is considered the gold standard in cholera diagnosis; however, it has recognised limitations due to its low sensitivity (as low as 70.8% reported in Sayeed 2018 [56]). Five studies specified that reference tests were undertaken in a blinded manner, with the remaining studies being unclear.
Sample flow refers to whether samples received the same reference standard, and whether all samples were included in the analysis. While all studies used reference standards consistently across samples, three studies (Debes 2016 [30], Liu 2013 [44], Matias 2017 [45]), did not include all samples in their analysis, and did not report reasons for this exclusion.
The results of the risk of bias and applicability assessment are shown in Fig 2. A full table of results for individual studies is available in S6 Appendix.

Discussion
This review found 41 studies reporting on 28 different tests for cholera diagnosis and detection. The majority of these tests were immunologically-based and intended for field settings. Diagnostic accuracy of different tests appeared broadly similar, with 22 tests having both sensitivities and specificities above 95% reported by at least one study. However, accuracy was difficult to compare directly due to variations in sample handling and setting in which tests were assessed. Additionally, low sample sizes limited the validity of some assessments, particularly in those 10 studies where sample size was less than 100. When interpreting sensitivity and specificity results it is critical to recognise that these are paired outcomes, which tend to be inversely correlated [60]. For this reason, statistical methods such as calculation of the area under the curve (AUC) and diagnostic odds ratios (DOR) are preferred, though less intuitive to interpret.
Table 4 (fragments recovered from extraction). Ease of use: [...] conversely other users reported that "the SMART device was often difficult to interpret and was frustrating to use" [42]. Training: no or little training required [23,36], although offered in one study [42]; [...] [50]; one study [50] reports that training users had no impact on test sensitivity; specificity was lower in untrained users, however the difference was not statistically significant. Storage: stable between 4-30˚C and in humid conditions; storage at room temperature [22,42].

Meta-analysis
Area under the curve (AUC) estimates were all above 0.8, with two tests having estimates above 0.95 (Cholera Screen and IP dipstick), demonstrating these tests provided correct diagnoses over 95% of the time. However, the inclusion of only a small number of studies prompts caution over AUC results [61], particularly in the meta-analyses of the Cholera Screen and IP dipstick tests, where only four and three data points were included, respectively. Additionally, a high area under the curve can occur even when individual studies report very high sensitivities but low specificities (e.g., Carillo 1994 [25] in the Cholera Screen meta-analysis, reporting a specificity of only 22%). DOR results can be found in S3 Appendix.

Narrative synthesis: Factors affecting sensitivity and specificity
The narrative synthesis completed for all reported products indicated that multiple tests had at least one study reporting both sensitivity and specificity of over 90%. For example, for field-based tests on stool samples with no enrichment step, this included nine tests (BengalScreen, Bengal DFA, Bengal SMART, Cholera Screen, Cholkit, Cholera SMART, Crystal VC, IP dipstick and SD Bioline). However, wide variation was seen across studies, and several factors prompt caution over interpretation of sensitivity and specificity results. While bacterial culture is considered the gold standard in cholera diagnostics due to its high specificity, sensitivity is reportedly low [10,50,56]. This creates a situation whereby the index test may be more sensitive than the reference standard: true cases missed by culture but detected by the index test are counted as false positives, leading to an underestimation of index test specificity. Accuracy of index tests assessed only against bacterial culture should therefore be interpreted with this in mind. Two approaches have been used to overcome this. The first is the use of combination references, such as culture alongside PCR [48,50]. The second is the use of Bayesian latent class analysis, which considers prior information regarding accuracy of bacterial culture (as used by Sayeed 2018 [56] and Page 2012 [50]).
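The effect of an imperfect reference standard on apparent specificity can be illustrated with a short calculation. The sketch below assumes that index and reference test errors are conditionally independent given true disease status, and uses a hypothetical prevalence of 50%; only the culture sensitivity of 70.8% comes from the included literature (Sayeed 2018 [56]). The function name and parameters are illustrative.

```python
def apparent_specificity(prevalence, index_sens, index_spec, ref_sens, ref_spec=1.0):
    """Expected index-test specificity when judged against an imperfect reference.

    Assumes index and reference errors are independent given true disease status.
    """
    diseased, healthy = prevalence, 1 - prevalence
    # Samples the reference calls negative: missed true cases plus true non-cases.
    ref_negative = diseased * (1 - ref_sens) + healthy * ref_spec
    # Of those, samples the index test also calls negative.
    both_negative = (diseased * (1 - ref_sens) * (1 - index_sens)
                     + healthy * ref_spec * index_spec)
    return both_negative / ref_negative


# A hypothetical perfect index test judged against culture with 70.8% sensitivity,
# at an assumed 50% prevalence among tested samples.
spec = apparent_specificity(prevalence=0.5, index_sens=1.0, index_spec=1.0, ref_sens=0.708)
print(f"apparent specificity: {spec:.3f}")
```

Under these assumptions even a perfect index test would appear to have a specificity of roughly 77%, because every true case missed by culture is scored as a false positive. The size of the bias grows with prevalence, which may contribute to between-study variation.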
The majority of tests reviewed are intended for use in the field, in cholera outbreak situations. However, the studies often assessed tests in alternate settings, such as in a lab using field samples [28,45]. There was some evidence that studies that did assess tests in 'real' field settings found lower sensitivities and specificities than those using alternate settings. For example, for the Cholera SMART test, Kalluri 2006 [42] reported sensitivity of 58% in a field setting whereas Bolaños 2004 [23] reported sensitivity of 100% during laboratory assessment. However, Hasan 1994b [36], assessing the Cholera SMART in a field trial undertaken at a research centre (International Centre for Diarrhoeal Disease Research, Bangladesh) reported sensitivities of 95.6-100%. Moreover, this pattern is not seen across all tests: the Crystal VC results show no clear difference between studies in contexts that match intended use and those that do not. Ley 2012 [43] explicitly compared performance of the Crystal VC in the laboratory and the field, using the same samples, finding sensitivity and specificity of 90% and 55.6%, respectively, in the field, and 87.5% and 74.1%, respectively, in the laboratory. However, this difference was not statistically significant. Mukherjee 2010 [47] reports that the specific conditions of the field may also affect performance: during monsoon season, when V. cholerae cases were more prevalent, sensitivity and specificity of Crystal VC were 100% and 87.3%, compared to 88% and 61% during the post-monsoon and winter season when V. cholerae cases were less prevalent.
Finally, the skill level of the tester may affect test performance: Kalluri 2006 [42] found a sensitivity and specificity for Cholera SMART of 58% and 95%, respectively, when the test was undertaken by field technicians, but 83% and 88% when undertaken by lab technicians. The study by Page 2012 [50] assessed performance of Crystal VC when undertaken by laboratory technicians and clinicians. Using Bayesian analysis, sensitivity and specificity of 93% and 85.3% were reported when undertaken by laboratory technicians, compared with 93.8% and 78.4% when undertaken by clinicians. However, these differences in specificity were not statistically significant [50].

Narrative synthesis: Other product characteristics
Other characteristics of tests were generally poorly reported on in studies. However, a limited number of studies did report on factors such as cost, turnaround time, and ease of use, all major factors which affect how tests perform in practice. Given the similarity in diagnostic accuracy results of many of the tests, it is the intended setting of a product and the product's other technical or usability characteristics that are likely to drive decisions over which test is most appropriate in a given situation. For example, if diagnostic test cost is the ultimate factor affecting utility of a test in developing countries (as suggested by Kalluri 2006 [42]), the Crystal VC (costing $1.90 per test), or the SD Bioline (at approximately €2 per test) are the cheapest, although any sample enrichment required will somewhat increase this cost. If turnaround time is prioritised, the Cholera Screen and the Pathogen Detection Kit have the fastest reported times, both at under 5 minutes [23,29,39].
Sample enrichment additionally affects turnaround time. The estimates reported in Table 4 for test time do not include enrichment time, despite numerous studies using samples enriched in Alkaline Peptone Water (APW) prior to testing [10,24,30,32,40]. Enrichment time varied from four hours in APW (Crystal VC [10]; two-tip dipstick ELISA [58]; Cholera diagnostic kit [57]; IP dipstick [22]; SD Bioline [48]) to 24 hours in APW (Crystal VC [28,30]), which substantially increases the turnaround time of RDTs for which it may be required. Additionally, reported test-time estimates were manufacturer specified, rather than assessed by the independent evaluators. Kalluri 2006 [42] compared the manufacturer's specification with the actual time taken in the field for three tests and reported that: Cholera SMART had a 10-15 minute specification whereas actual field time ranged from five to 40 minutes (mean 19 minutes); the IP dipstick had a 10-minute specification but took between three and 58 minutes in the field (mean 16 minutes); and the Medicos dipstick had a 10-minute specification but in practice took between seven and 54 minutes (mean 23 minutes).
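The gap between specified and observed test times can be made explicit with a short calculation. The sketch below, in Python, takes the manufacturer specifications and mean field times reported by Kalluri 2006 [42] and expresses each mean field time as a percentage increase over the specification. The function and variable names are illustrative, and treating the 10-15 minute Cholera SMART specification as a lower and an upper bound is an assumption of this sketch rather than a method described in the study.

```python
# Manufacturer specification (min, max) and mean observed field time, in minutes,
# as reported by Kalluri 2006 [42]. A single-value specification is stored as (x, x).
TEST_TIMES = {
    "Cholera SMART": {"spec": (10, 15), "field_mean": 19},
    "IP dipstick": {"spec": (10, 10), "field_mean": 16},
    "Medicos dipstick": {"spec": (10, 10), "field_mean": 23},
}


def percent_overrun(spec_minutes: float, field_mean_minutes: float) -> float:
    """Percentage by which the mean field time exceeds the specified time."""
    return 100.0 * (field_mean_minutes - spec_minutes) / spec_minutes


def overrun_range(spec: tuple, field_mean: float) -> tuple:
    """Smallest and largest overrun implied by a (min, max) specification.

    Assumption of this sketch: the upper end of the specification gives the
    smallest overrun and the lower end gives the largest.
    """
    spec_min, spec_max = spec
    return (percent_overrun(spec_max, field_mean), percent_overrun(spec_min, field_mean))


for name, times in TEST_TIMES.items():
    low, high = overrun_range(times["spec"], times["field_mean"])
    if low == high:
        print(f"{name}: mean field time {low:.0f}% longer than specified")
    else:
        print(f"{name}: mean field time {low:.0f}% to {high:.0f}% longer than specified")
```

Adding any enrichment time (four to 24 hours in APW in the studies above) to the field mean before calling `percent_overrun` would show the overrun for the full workflow rather than for the test step alone.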
There was insufficient evaluation of ease of use and training requirements to determine which tests were considered the most usable. While many studies briefly described tests as 'simple' or 'easy to use', it was not clear whether this information came from independent evaluation or manufacturer specifications. When usability was evaluated in studies, results appeared more mixed: for example, untrained users of the Crystal VC reported difficulties differentiating the O1 and O139 test lines [50], and laboratory and field technicians using Cholera SMART reported that the device was "often difficult to interpret and was frustrating to use" [42].

Results in the context of previous reviews
This review, which used a systematic review methodology, confirms the breadth of RDTs available for cholera detection with acceptable sensitivity and specificity suggested by previous non-systematic reviews [11,13,14]. However, the range of scores achieved for the same tests in different studies and contexts of use reinforces concerns regarding small sample sizes raised by Dick et al. [11] and the lack of field evaluation of kits indicated by both Ramamurthy et al. [13] and Dick et al. [11].

Limitations
To our knowledge, this is the first systematic review and meta-analysis of products for the diagnosis and detection of cholera. Eleven databases of published and grey literature were searched, along with reference lists of identified studies, to capture as many relevant studies as possible. Despite this, some papers may have been missed in searches and therefore not included. Additionally, given our search of predominantly English language databases, and problems accessing papers in alternate languages, our results show an Anglophone bias. As noted, data extraction from, and interpretation of, many studies was constrained by lack of detail in reports (including failure to comply with the Standards for Reporting of Diagnostic Accuracy (STARD) guidelines [62]